Overview
pipeline_utils.py is a centralized utilities module that provides shared constants, helper functions, and common configurations used across all EDL Pipeline scripts. This promotes code reusability, consistency, and easier maintenance.
Purpose
This utility module serves to:
- Centralize Configuration: Single source of truth for common settings
- Reduce Code Duplication: Shared functions used by multiple scripts
- Improve Maintainability: Update headers/settings in one place
- Enhance Security: User-agent rotation to avoid detection
How Scripts Use It
Multiple pipeline scripts import and use these utilities:
- fetch_all_indices.py: Uses get_headers()
- fetch_fundamental_data.py: Uses get_headers(include_origin=True)
- fetch_technical_data.py: Uses get_headers()
- Any future pipeline scripts requiring standardized headers
API Reference
Constants
BASE_DIR
Absolute path to the directory containing the EDL Pipeline scripts.
List of browser user-agent strings, rotated to avoid detection.
Contents:
- Chrome on Windows 10
- Chrome on macOS
- Chrome on Linux
- Firefox on Windows 10
- Safari on macOS
Functions
get_headers()
Generates standard HTTP headers with a randomly selected user-agent for API requests.
include_origin
Whether to include Origin and Referer headers for CORS compliance.
- False: Returns basic headers (Content-Type, User-Agent, Accept)
- True: Adds Origin and Referer headers pointing to scanx.dhan.co
Returns: dict
A dictionary containing HTTP headers.
Return Structure (include_origin=False):
Source Code
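The original listing is not reproduced here. The following is a minimal sketch consistent with the API reference above, not the module's actual source. The list name USER_AGENTS, the exact user-agent strings, the Content-Type and Accept values, and the https scheme on the Origin and Referer URLs are all assumptions made for illustration.

```python
import os
import random

# Directory containing this file. The fallback covers interactive sessions
# where __file__ is undefined (an addition for this sketch).
try:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    BASE_DIR = os.getcwd()

# Assumed name and illustrative strings, one per browser/platform listed above.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) "
    "Gecko/20100101 Firefox/121.0",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 "
    "(KHTML, like Gecko) Version/17.0 Safari/605.1.15",
]


def get_headers(include_origin=False):
    """Return request headers with a randomly chosen user-agent."""
    headers = {
        "Content-Type": "application/json",  # assumed value
        "User-Agent": random.choice(USER_AGENTS),
        "Accept": "application/json",  # assumed value
    }
    if include_origin:
        # Assumed URLs; the reference only says they point to scanx.dhan.co.
        headers["Origin"] = "https://scanx.dhan.co"
        headers["Referer"] = "https://scanx.dhan.co/"
    return headers


basic = get_headers()                     # Content-Type, User-Agent, Accept
scanx = get_headers(include_origin=True)  # additionally Origin and Referer
```

Because the user-agent is drawn at call time, each call to get_headers() may return a different User-Agent value, so a script that wants rotation per request would call it once per request rather than caching the result.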
Design Decisions
Why User-Agent Rotation?
Many APIs implement rate limiting or blocking based on user-agent strings. By rotating between different browser user-agents, the pipeline mimics organic traffic patterns and reduces the likelihood of being flagged or blocked.
Why Optional Origin Headers?
Some endpoints require CORS (Cross-Origin Resource Sharing) headers (Origin and Referer) to validate requests, while others don't. The include_origin parameter provides flexibility without code duplication.
Why BASE_DIR Constant?
Having a centralized base directory path allows scripts to reference relative paths consistently, making the pipeline portable across different environments without hardcoded paths.
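As a rough illustration of how a script might resolve files against the base directory, the sketch below builds an absolute path from it. The helper pipeline_path and the file name output/indices.json are hypothetical, and the way BASE_DIR is computed here is an assumption about the module.

```python
import os

# Assumed definition of BASE_DIR; falls back to the working directory
# where __file__ is undefined (e.g. an interactive session).
try:
    BASE_DIR = os.path.dirname(os.path.abspath(__file__))
except NameError:
    BASE_DIR = os.getcwd()


def pipeline_path(*parts):
    """Hypothetical helper: join path components under BASE_DIR."""
    return os.path.join(BASE_DIR, *parts)


# Hypothetical output location, resolved the same way on every machine.
output_file = pipeline_path("output", "indices.json")
```

Since the path is derived from the module's own location, the result does not depend on the directory the script is launched from.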
Best Practices
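Always Import from pipeline_utils
Instead of hardcoding headers in each script, always import get_headers() to ensure consistency and benefit from updates.
- Avoid: a hardcoded headers dictionary in each script
- Good: calling get_headers() imported from pipeline_utils
Use include_origin for ScanX API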
When calling ScanX endpoints (dhan.co domains), use include_origin=True to ensure CORS compliance.
Extend, Don't Duplicate
If you need additional utility functions, add them to pipeline_utils.py instead of creating separate utility files.
Dependencies
- os: File path operations
- random: Random selection of user agents
This module has no external dependencies beyond the Python standard library, making it lightweight and portable.
Future Enhancements
Potential additions to this utility module:
- Centralized logging configuration
- Retry logic with exponential backoff
- Environment variable management
- Common data validation functions
- Shared file I/O helpers
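To make the retry item above more concrete, here is one possible shape for such a helper. It is only a sketch: the name retry_with_backoff, its parameters, and the default values are hypothetical and do not exist in the module today.

```python
import time


def retry_with_backoff(func, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    After the n-th failed attempt (counting from 0) the helper waits
    base_delay * 2**n seconds. The last failure is re-raised.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the original error
            sleep(base_delay * (2 ** attempt))


# Usage with a stand-in for a flaky request: fails twice, then succeeds.
calls = {"count": 0}
delays = []


def flaky():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("temporary failure")
    return "ok"


# Passing delays.append as the sleep function records waits without sleeping.
result = retry_with_backoff(flaky, sleep=delays.append)
```

Injecting the sleep function keeps the helper easy to test; a real version would likely catch only network-related exceptions rather than every Exception.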